Analysis of the molecular similarity of least common amino acid sites in the ACE2 receptor to predict potentially susceptible species for SARS-CoV-2

SARS-CoV-2 infections in animals have been reported globally. However, the understanding of the complete spectrum of animals susceptible to SARS-CoV-2 remains limited. The virus’s dynamic nature and its potential to infect a wide range of animals are crucial considerations for a One Health approach that integrates both human and animal health. This study introduces a bioinformatic approach to predict potential susceptibility to SARS-CoV-2 in both domestic and wild animals. By examining genomic sequencing, we establish phylogenetic relationships between the virus and its potential hosts. We focus on the interaction between the SARS-CoV-2 genome sequence and specific regions of the host species’ ACE2 receptor. We analyzed and compared ACE2 receptor sequences from 29 species known to be infected, selecting 10 least common amino acid sites (LCAS) from key binding domains based on similarity patterns. Our analysis included 49 species across primates, carnivores, rodents, and artiodactyls, revealing complete consistency in the LCAS and identifying them as potentially susceptible. We employed the LCAS similarity pattern to predict the likelihood of SARS-CoV-2 infection in unexamined species. This method serves as a valuable screening tool for assessing infection risks in domestic and wild animals, aiding in the prevention of disease outbreaks.


Introduction
Coronavirus disease 2019 (COVID-19) is a highly contagious zoonosis caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Since its first detection in Wuhan in December 2019, COVID-19 has spread rapidly across the globe [1], but the origin of the virus is still unknown. Bats and pangolins have been considered possible natural hosts for SARS-CoV-2, but there is no conclusive evidence [2,3]. The host range of SARS-CoV-2 is not limited to humans and has expanded to other mammals such as pet cats and minks, which were found infected in March and April of 2020 in Belgium and Spain, respectively [3,4]. Subsequently, SARS-CoV-2 infection was detected in ferrets, dogs, golden hamsters, white-tailed deer, rhesus macaques, tigers, lions, and other species, as reported by the World Organization for Animal Health (WOAH) [5,6]. The growing number of infected mammalian species indicates a risk of cross-species transmission of SARS-CoV-2. Cross-species transmission may lead to the establishment of the virus in new hosts and to its further spread [7]. This poses a serious threat to global public health and biodiversity.
SARS-CoV-2 specifically binds to receptors on the surface of host cells, which is a key step in viral infection [8]. So far, the virus has infected new species that carry a homologous target receptor capable of binding SARS-CoV-2. Receptor recognition by SARS-CoV-2 is an important determinant of its transmission between species [9][10][11]. The specific receptor of the new coronavirus is angiotensin-converting enzyme 2 (ACE2), which is widely expressed in animals as a cell surface receptor. The abundance of ACE2 receptors in many organs of the body, including the brain, heart, kidney, nasopharynx, lymph nodes, small intestine, colon, stomach, thymus, skin, spleen, bone marrow, liver, blood vessels, and oral and nasal mucosa, renders them susceptible to infection by SARS-CoV-2 [12][13][14].
Researchers have extensively studied SARS-CoV-2 in order to determine its host range [15,16]. However, animals at high risk of contracting SARS-CoV-2 cannot be accurately predicted from phylogenetic relationships based on comparisons of the entire ACE2 gene [15,17]. In vivo animal infection experiments provide the best opportunity to understand susceptibility to SARS-CoV-2 across mammals [18]. However, conducting in vivo studies on a wide array of animals, particularly wildlife, is complex and demands considerable manpower and resources. Additionally, ethical concerns arise when performing experiments on a diverse range of wild animals. Therefore, attention has turned to analyzing the key ACE2 domains that bind SARS-CoV-2 in order to predict high-risk susceptible animals [10,19-24]. Receptor similarity analysis is often used to predict the transmission of viruses between species [25]. Myeongji Cho's sequence-based approach suggests that it may be possible to identify virus transmission between hosts without requiring complex structural analysis [17]. This method has been used to study the host range of the new coronavirus by predicting the homology of key receptor amino acid sequences and by key binding site methods [15,16,26,27]. On this basis, we propose a new screening approach that identifies and combines the important least common amino acid sites (LCAS) in ACE2 from known susceptible hosts and uses them as a standard for evaluating the risk of SARS-CoV-2 infection in species of unknown susceptibility. It can be used as a screening tool and has important scientific implications for discovering potentially susceptible hosts of SARS-CoV-2 and assessing its possible transmissibility across species.

ACE2 receptor sequence collection
The protein sequences of ACE2 from mammalian species were gathered from the National Center for Biotechnology Information (NCBI) Protein Database (https://www.ncbi.nlm.nih.gov/) and UniProt. Both databases were queried for records with "ACE2" as the gene name and "Mammalia" as the taxonomic class. Next, one complete ACE2 amino acid sequence per species was retained and extracted in FASTA format. The protein IDs in the sequence files were then renamed as follows: ACE2_NCBI gene accession ID_Species name.
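The retention and renaming steps can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the study: the input header layout `<accession>|<species name>`, the accession IDs, and the sequence fragments are all hypothetical placeholders.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def rename_records(records):
    """Keep the first record per species and rename it ACE2_<accession>_<species>.

    Assumes a hypothetical input header of the form '<accession>|<species name>'.
    """
    seen, renamed = set(), []
    for header, seq in records:
        accession, species = [part.strip() for part in header.split("|", 1)]
        if species in seen:
            continue  # retain one sequence per species
        seen.add(species)
        renamed.append((f"ACE2_{accession}_{species.replace(' ', '_')}", seq))
    return renamed


# Toy input: placeholder accessions and short made-up fragments, not real data.
example = """>ACC0001|Homo sapiens
MSSSSWLLLS
LVAVTAAQST
>ACC0002|Felis catus
MSGSFWLLLS
>ACC0003|Felis catus
MSGSFWLLLS
"""

renamed = rename_records(parse_fasta(example))
for name, seq in renamed:
    print(f">{name}\n{seq}")
```

Keeping the first record per species is an arbitrary choice made for the sketch; the text above only states that one complete sequence per species was retained.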

ACE2 receptor data processing
The downloaded FASTA file was imported into MAFFT [32] for sequence alignment, duplicate sequences were removed, and the result was output in the same FASTA format. The aligned sequences were then imported into BioEdit [33], and the human ACE2 receptor sequence was moved to the first line. Using the human ACE2 sequence as a reference, sequences with missing or additional amino acid sites were deleted. Finally, the sequences were renamed in the form 'species_sequence number'. All data were output in FASTA format.
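The reference-based filtering was done by hand in BioEdit. As a rough programmatic equivalent, the sketch below assumes the alignment is already loaded as a dictionary of equal-length strings with '-' as the gap character. The function name and the toy sequences are hypothetical.

```python
def filter_alignment(aligned, reference_name):
    """Keep only sequences without indels relative to the reference.

    `aligned` maps sequence names to aligned strings of equal length.
    The reference is placed first in the returned dictionary.
    """
    ref = aligned[reference_name]
    kept = {reference_name: ref}
    for name, seq in aligned.items():
        if name == reference_name:
            continue
        if len(seq) != len(ref):
            raise ValueError(f"{name} is not aligned to the reference")
        # An indel is a column where exactly one of the two sequences has a gap.
        if any((a == "-") != (b == "-") for a, b in zip(ref, seq)):
            continue
        kept[name] = seq
    # Remove columns that are gaps in every remaining sequence.
    keep_cols = [i for i in range(len(ref))
                 if any(seq[i] != "-" for seq in kept.values())]
    return {name: "".join(seq[i] for i in keep_cols)
            for name, seq in kept.items()}


# Toy alignment with made-up fragments, not real ACE2 data.
aligned = {
    "Felis_catus_2": "MSGSFW-LLS",
    "Homo_sapiens_1": "MSSSSW-LLS",
    "Toy_insertion_3": "MSSSSWALLS",  # extra residue relative to human
    "Toy_deletion_4": "MS-SSW-LLS",   # missing residue relative to human
}

filtered = filter_alignment(aligned, "Homo_sapiens_1")
print(filtered)
```

After filtering, every remaining sequence has the same length as the human reference, so alignment columns correspond directly to human ACE2 residue numbers.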

LCAS selection
The species with collected ACE2 sequences were divided into two groups: known susceptible species and unknown species. The key amino acid regions of the human ACE2 receptor sequence that strongly bind SARS-CoV-2 were identified from the literature [9,10,15,19,20,34,35]. The amino acid sequences of the known susceptible species were imported into BioEdit [33], and the sites within these key regions were highlighted and pasted into a new Excel spreadsheet. Finally, using the human ACE2 receptor amino acid sequence as the standard, the amino acid sites that were completely identical in all known susceptible species were selected; these are the least common amino acid sites (LCAS). The finalized LCAS set was documented in an organized format for subsequent analyses. This set represents the least common denominator across susceptible species and forms the basis for further investigation.
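The selection rule (keep a key site only if every known susceptible species carries the human residue there) can be written as a minimal sketch. It assumes gap-free aligned sequences in which column numbers equal human residue numbers. The function name, the candidate positions, and the toy sequences are hypothetical and do not reproduce the study's data.

```python
def select_lcas(known, reference_name, key_positions):
    """Return (position, residue) pairs for key sites conserved in all species.

    Positions are 1-based and follow the numbering of the reference sequence.
    """
    ref = known[reference_name]
    lcas = []
    for pos in key_positions:
        ref_residue = ref[pos - 1]
        if all(seq[pos - 1] == ref_residue for seq in known.values()):
            lcas.append((pos, ref_residue))
    return lcas


# Toy sequences and candidate positions, for illustration only.
known = {
    "Homo_sapiens": "MSSSSWLLLS",
    "Felis_catus": "MSGSFWLLLS",
    "Canis_lupus": "MSGSSWLLLS",
}
candidate_positions = [2, 3, 5, 8]

lcas = select_lcas(known, "Homo_sapiens", candidate_positions)
print(lcas)
```

In this toy example positions 3 and 5 are dropped because at least one species differs from the human residue, while positions 2 and 8 are retained.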

Analysis of potentially susceptible hosts
The ACE2 sequences of the unknown species were imported into BioEdit, and the LCAS were highlighted. The LCAS patterns of the unknown species were then compared with the pattern of the known susceptible species in a new Excel spreadsheet for systematic analysis. Species displaying entirely identical LCAS patterns were categorized as potentially susceptible hosts; species with non-identical patterns were categorized as non-potential susceptible hosts.
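The classification rule amounts to an exact match on the residues at the LCAS positions. The sketch below illustrates it with hypothetical names, positions, and toy sequences, and assumes sequences numbered as in the human reference.

```python
def lcas_pattern(seq, lcas_positions):
    """Concatenate the residues found at the 1-based LCAS positions."""
    return "".join(seq[pos - 1] for pos in lcas_positions)


def classify_species(unknown, reference_pattern, lcas_positions):
    """Split species into potential and non-potential susceptible hosts."""
    potential, non_potential = [], []
    for name, seq in unknown.items():
        if lcas_pattern(seq, lcas_positions) == reference_pattern:
            potential.append(name)
        else:
            non_potential.append(name)
    return potential, non_potential


# Toy data: two LCAS positions and the pattern shared by known hosts.
lcas_positions = [2, 8]
reference_pattern = "SL"
unknown = {
    "Toy_species_A": "MSGSFWLLLS",  # matches at both positions
    "Toy_species_B": "MTGSFWLLLS",  # differs at position 2
    "Toy_species_C": "MSGSFWLMLS",  # differs at position 8
}

potential, non_potential = classify_species(unknown, reference_pattern, lcas_positions)
print("potential:", potential)
print("non-potential:", non_potential)
```

A single mismatch at any LCAS position is enough to place a species in the non-potential group, which mirrors the all-or-nothing criterion described above.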
The neighbor-joining (NJ) method in MEGA11 was used to construct a phylogenetic tree of the potentially susceptible hosts. Distances between species in the NJ phylogenetic tree were scaled between 0 and 1. A bootstrap test with 1000 replicates was performed when building the tree.
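Neighbor joining starts from a matrix of pairwise distances. As an illustration of a distance bounded between 0 and 1, the sketch below computes the p-distance (the proportion of differing sites) for every pair of aligned sequences. The choice of p-distance is an assumption made for the example, since the substitution model used in MEGA11 is not stated here, and the code is not MEGA11's implementation. The sequences are toy fragments.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of compared sites that differ (0 = identical, 1 = all differ).

    Columns with a gap ('-') in either sequence are skipped.
    """
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def distance_matrix(aligned):
    """Return a symmetric dictionary of pairwise p-distances keyed by name pairs."""
    names = list(aligned)
    dist = {(name, name): 0.0 for name in names}
    for a, b in combinations(names, 2):
        d = p_distance(aligned[a], aligned[b])
        dist[(a, b)] = dist[(b, a)] = d
    return dist


# Toy aligned fragments, not real ACE2 data.
aligned = {
    "Homo_sapiens": "MSSSSWLLLS",
    "Felis_catus": "MSGSFWLLLS",
    "Canis_lupus": "MSGSSWLLLS",
}

dist = distance_matrix(aligned)
for (a, b), d in sorted(dist.items()):
    print(f"{a}\t{b}\t{d:.2f}")
```

A bootstrap test would repeat the tree construction on alignments whose columns are resampled with replacement; that step is omitted from this sketch.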

Collection of SARS-CoV-2 susceptible hosts
The list of animals infected with SARS-CoV-2 was compiled from WOAH reports and the literature. In total, 63 species were reported as infected with SARS-CoV-2, including 38 species from 16 families infected naturally (Table 1) and 25 species from 12 families infected under experimental conditions (Table 2). Statistics on the known susceptible hosts are shown in Fig 1.

Collection of the ACE2 receptor sequence
We collected 407 ACE2 protein receptor sequences from various species from the UniProt database. After eliminating incomplete and duplicate sequences, 86 complete ACE2 protein sequences remained. In addition, we obtained 23 complete ACE2 protein sequences from

Processing of ACE2 receptor data
We classified the 109 ACE2 receptor sequences into two groups: known susceptible hosts (29 species in 10 families) and hosts of unknown susceptibility (80 species in 35 families) (Tables 1 and 2). The 29 species in the first group are those among the 109 that are known to be susceptible to SARS-CoV-2. The key regions of the human ACE2 receptor sequence were selected for further study (Table 4).

Screening of LCAS
The key regions of the human ACE2 receptor sequence were compared across the species known to be susceptible to SARS-CoV-2 (Table 4). As a result of the comparison, the 10 most common amino acid sites-

Analysis of potentially susceptible hosts
In this study, ACE2 sequences from 80 species of unknown susceptibility were compared against the 10 LCAS, and their similarity patterns were examined. The ACE2 receptor sequences of 49 species across 25 families were identical to those of known susceptible species at all 10 LCAS, suggesting potential susceptibility to SARS-CoV-2 (Table 5). Thirty-one species from 21 families were considered non-potential susceptible hosts because their sequences did not match the 10 LCAS (Table 6). Potentially susceptible hosts belong primarily to the orders Primates, Carnivora, Rodentia, and Artiodactyla, indicating that closely related animals are more likely to be infected with the novel coronavirus. Fig 3 illustrates the evolutionary relationships among the potentially susceptible hosts.

Discussion
We performed a comparative analysis of the ACE2 receptor protein sequences of 109 species. Ten key amino acid sites shared by all known SARS-CoV-2 susceptible species served as the reference standard and were used to identify potential risk hosts. The results reveal that 49 species were potentially susceptible hosts and 31 species were non-susceptible hosts. Most of the potentially susceptible hosts belong to the same orders as the known susceptible hosts, indicating to some extent that closely related species are more susceptible. Notably, two species that appeared in the prediction results (Manis pentadactyla and Manis javanica) have not been reported before. This indicates that, while focusing on closely related species, it is necessary to pay attention to other species and to protect animals on a larger scale. The rising number of wild and domestic animals infected with SARS-CoV-2 challenges us to rethink outbreak control strategies in the post-epidemic era and to prepare for future emerging infectious diseases. However, not all closely related species are potentially susceptible. The key amino acid at position 41 of the ACE2 receptor in Capuchinidae, night monkeys, and marmosets differs from that in humans. Many studies have confirmed that mutations at position 41 may break key hydrogen bonds, reducing the binding capacity of SARS-CoV-2 to ACE2 [17,51]. Bats are generally considered to be the main natural hosts of the new coronavirus, but the amino acid at position 35 in Rhinolophus macrotis and Rhinolophus ferrumequinum of the family Rhinolophidae differs from that in humans [35]. The E35K mutation can reduce the binding capacity of SARS-CoV-2. Jun Lan et al. found that ACE2 of Rhinolophus ferrumequinum cannot mediate the entry of the new coronavirus [52]. This suggests that not all bats are susceptible to the new coronavirus. Assessing the susceptibility of various bat species to the new coronavirus is the first step in tracing the virus in bats and can significantly reduce the difficulty of tracing its origin. Paguma larvata was inconsistent with the LCAS pattern and was therefore not predicted to be susceptible, but recent studies have shown that it can be infected with the new coronavirus in vitro [18], which may be related to other factors inherent in the animal. Therefore, further research is needed on whether civets can be naturally infected with and spread the new coronavirus.
In this study, a minimum number of key amino acid sites was selected based on the LCAS of known susceptible hosts, which greatly reduces the complexity of the work and allows rapid prediction of potentially susceptible hosts for the new coronavirus. Genetic variations in the host receptor ACE2 may also contribute to susceptibility or resistance to viral infection, depending on how variations in the spike protein influence the cross-species transmission of the virus. Studies have shown that mutations at S19, K31, E35, Y41, K68, and D355 decrease the binding capacity of the virus to the receptor [34,35]. The predicted results are largely consistent with those of other studies [2,26], which supports their accuracy. The method is simple and can provide ideas for predicting potentially susceptible hosts in the early stages of disease outbreaks. It supports taking preventive measures for potential hosts in advance to control future outbreaks and reduce animal infections. The constant mutation of the coronavirus increases its ability to bind the ACE2 receptor as well as to resist the immune response [53]. For example, N501Y can form a new interaction with Y41 of the ACE2 receptor, and it is widely present in variants [54]. In particular, in the Omicron variant, S residue Y501 forms a T-shaped π-π stacking interaction with ACE2 residue Y41. The Q493R and Q498R mutations introduce two new salt bridges, with E35 and E38, respectively, replacing hydrogen bonds formed by the Wuhan-Hu-1 RBD and remodeling the electrostatic interactions with the ACE2 receptor. S477N leads to the formation of new hydrogen bonds between the asparagine side chain and the backbone amine and carbonyl groups of ACE2 S19 [53,55,56]. These interactions illustrate that key amino acid sites on the ACE2 receptor are important for viral binding. We considered only the key amino acid sites of virus-receptor interactions to predict susceptibility. However, viral entry into host cells and replication are influenced by many other factors, such as the proteases TMPRSS2 and cathepsin L (CTSL1), and ADAM-17 [57]. Therefore, key amino acid sites alone are not sufficient.

Conclusions
In summary, we used a simple method to provide valuable insights into potential hosts at the early stages of disease outbreaks. We predicted 49 species as potentially susceptible hosts and 31 species as non-susceptible hosts. Notably, Manis pentadactyla and Manis javanica were among the predicted species, emphasizing the importance of considering a broader range of species in outbreak control. The research underscores the significance of genetic variations in the ACE2 receptor and how they influence susceptibility or resistance to viral infection. This information supports proactive preventive measures for potential hosts, aiding outbreak control and reducing the risk of animal infections. However, it is crucial to acknowledge the study's limitations and the ongoing need for research and validation to improve our understanding of cross-species transmission and our preparedness for emerging infectious diseases. Predicting the species at risk of SARS-CoV-2 infection from key amino acid sites alone is not sufficient. Therefore, a comprehensive approach involving surveillance, laboratory validation, and clinical observation is essential to confirm the predicted susceptibility of animals to SARS-CoV-2 infection. These are crucial steps for controlling future outbreaks and for building a more nuanced understanding of cross-species transmission dynamics.

Fig 3. (a) The MEGA11 module calculates the IQ-TREE optimal model to build a phylogenetic tree. iTOL shows the percentage of the total number of species by order in the outer circle and the number of species by family in the inner circle. (b) The number of species in each order, shown as a two-dimensional bar chart. (c) Percentage of animal species in each taxonomic order at potential risk of COVID-19. https://doi.org/10.1371/journal.pone.0293441.g003

Table 3 .
(Continued) binding capacity of SARS-CoV-2.Jun Lan et.al. found that ACE2 of Rhinolophus ferrumequinum cannot mediate the entry of the new coronavirus